
New bandwidth selection criterion for Kernel PCA: Approach to dimensionality reduction and classification problems



Abstract

Background: DNA microarrays are a potentially powerful technology for improving diagnostic classification, treatment selection, and prognostic assessment. This technology has been used to predict cancer outcome for almost a decade. Disease class predictors can be designed for known disease cases and can provide diagnostic confirmation or clarify abnormal cases. The main input to these class predictors is high-dimensional data with many variables and few observations. Reducing the dimensionality of this feature set significantly speeds up the prediction task. Feature selection and feature transformation methods are well-known preprocessing steps in bioinformatics, and several prediction tools are based on these techniques.

Results: Studies show that a well-tuned Kernel PCA (KPCA) is an efficient preprocessing step for dimensionality reduction, but the available bandwidth selection method for KPCA is computationally expensive. In this paper, we propose a new data-driven bandwidth selection criterion for KPCA that is related to least squares cross-validation for kernel density estimation. We also propose a new prediction model that combines a well-tuned KPCA with a Least Squares Support Vector Machine (LS-SVM). We estimate the accuracy of the proposed model on 9 case studies. We then compare its performance (in terms of test-set Area Under the ROC Curve (AUC) and computational time) with other well-known techniques: whole data set + LS-SVM, PCA + LS-SVM, t-test + LS-SVM, Prediction Analysis of Microarrays (PAM), and the Least Absolute Shrinkage and Selection Operator (Lasso). Finally, we compare the proposed strategy against an existing KPCA parameter tuning algorithm on two additional case studies.

Conclusion: We propose, evaluate, and compare several mathematical/statistical techniques that apply feature transformation or feature selection before classification, and we consider their application in medical diagnostics. Both feature selection and feature transformation perform well on classification tasks. Because of the dynamic nature of feature selection, it is hard to define a set of significant features for the classifier that predicts the classes of future samples. Moreover, the proposed strategy has the distinct advantage of relatively low time complexity.
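The abstract says the proposed KPCA bandwidth criterion is related to least squares cross-validation (LSCV) for kernel density estimation. The sketch below shows only the classic LSCV bandwidth selector for a Gaussian KDE, not the paper's KPCA criterion, which the abstract does not specify. The function names, the synthetic data, and the candidate grid are hypothetical choices for illustration. LSCV picks the h that minimizes ∫f̂² − (2/n) Σᵢ f̂₋ᵢ(Xᵢ), an unbiased estimate (up to a constant) of the integrated squared error.

```python
import numpy as np


def gaussian_log_density(sq_dist, sigma, d):
    """Log-density of an isotropic d-dimensional Gaussian N(0, sigma^2 I) at squared distance sq_dist."""
    return -0.5 * d * np.log(2.0 * np.pi * sigma**2) - sq_dist / (2.0 * sigma**2)


def lscv_score(X, h):
    """Least squares cross-validation score of a Gaussian KDE with bandwidth h (smaller is better).

    LSCV(h) = integral(fhat^2) - (2/n) * sum_i fhat_{-i}(X_i)
    """
    n, d = X.shape
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    # integral(fhat^2) in closed form: two N(0, h^2 I) kernels convolve to N(0, 2 h^2 I)
    int_f2 = np.exp(gaussian_log_density(sq, np.sqrt(2.0) * h, d)).sum() / n**2
    # leave-one-out term: exclude each point's own kernel (the diagonal i == j)
    k = np.exp(gaussian_log_density(sq, h, d))
    np.fill_diagonal(k, 0.0)
    loo_mean = k.sum() / (n * (n - 1))
    return int_f2 - 2.0 * loo_mean


def select_bandwidth_lscv(X, grid):
    """Return the candidate bandwidth with the smallest LSCV score."""
    scores = np.array([lscv_score(X, h) for h in grid])
    return grid[int(np.argmin(scores))]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))          # synthetic 2-D sample, for illustration only
grid = np.linspace(0.05, 2.0, 40)      # hypothetical candidate bandwidths
h_best = select_bandwidth_lscv(X, grid)
print(f"LSCV bandwidth: {h_best:.3f}")
```

A bandwidth that is good for density estimation is not automatically good for KPCA. In high dimensions a density-optimal h can be small relative to typical pairwise distances, which pushes the RBF kernel matrix toward the identity. So this snippet illustrates the LSCV idea rather than a drop-in KPCA tuner.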
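The prediction model described in the abstract chains a well-tuned KPCA with an LS-SVM and reports test-set AUC. The following is a minimal sketch of that pipeline shape under several stated assumptions. The bandwidth is set by the median heuristic, a swapped-in rule and not the paper's criterion. The LS-SVM uses a linear kernel on the KPCA scores with a hypothetical gamma = 10. The data are synthetic stand-ins for microarray data. All function names are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, h):
    """Gaussian (RBF) kernel matrix with entries exp(-||a - b||^2 / (2 h^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * h**2))


def median_heuristic(X):
    """Median pairwise Euclidean distance: a common stand-in bandwidth, NOT the paper's criterion."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return float(np.sqrt(np.median(sq[np.triu_indices(len(X), k=1)])))


class KernelPCA:
    """Minimal RBF kernel PCA with feature-space centering of the train and test kernels."""

    def __init__(self, h, n_components):
        self.h, self.m = h, n_components

    def fit(self, X):
        self.X = X
        K = rbf_kernel(X, X, self.h)
        self.col_mean, self.all_mean = K.mean(axis=0), K.mean()
        Kc = K - self.col_mean[None, :] - K.mean(axis=1)[:, None] + self.all_mean
        vals, vecs = np.linalg.eigh(Kc)            # eigenvalues in ascending order
        top = np.argsort(vals)[::-1][: self.m]     # indices of the m largest
        # scale eigenvectors so each principal axis has unit norm in feature space
        self.alphas = vecs[:, top] / np.sqrt(np.maximum(vals[top], 1e-12))
        return self

    def transform(self, Z):
        Kt = rbf_kernel(Z, self.X, self.h)
        Ktc = Kt - self.col_mean[None, :] - Kt.mean(axis=1)[:, None] + self.all_mean
        return Ktc @ self.alphas


def lssvm_fit(Z, y, gamma=10.0):
    """LS-SVM classifier with a linear kernel, solved in the regression form on +/-1 labels.

    [0  1^T            ] [b   ]   [0]
    [1  Z Z^T + I/gamma] [beta] = [y]
    """
    n = Z.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = Z @ Z.T + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    b, beta = sol[0], sol[1:]
    return Z.T @ beta, b  # primal weights w = sum_i beta_i z_i (valid for a linear kernel)


def auc(scores, y):
    """AUC as the Mann-Whitney probability that a positive outscores a negative (ties count 1/2)."""
    pos, neg = scores[y == 1], scores[y == -1]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))


rng = np.random.default_rng(0)
n, d = 80, 50                          # few samples, many features (synthetic)
y = np.where(np.arange(n) < n // 2, -1.0, 1.0)
X = rng.normal(size=(n, d))
X[y == 1, :10] += 2.0                  # synthetic class signal in the first 10 features
train = rng.permutation(n)[:50]
test = np.setdiff1d(np.arange(n), train)

h = median_heuristic(X[train])
kpca = KernelPCA(h, n_components=5).fit(X[train])
w, b = lssvm_fit(kpca.transform(X[train]), y[train], gamma=10.0)
scores = kpca.transform(X[test]) @ w + b
print(f"test AUC: {auc(scores, y[test]):.3f}")
```

Replacing `median_heuristic` with a tuned bandwidth is the step the paper's criterion addresses. The rest of the pipeline only needs an `h` value.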
